Data acquisition and collection are currently very dynamic processes. When the quantity of data is huge, extracting useful information from it is not a trivial task. Cluster analysis is very helpful here: the resulting grouping allows us to comprehend the available information and to look at it from a different perspective. We cannot cover the entire spectrum of issues related to data analysis, so we limit our discussion to cluster analysis and then describe the TCLUST algorithm, developed by H. Fritz, L. A. García-Escudero and A. Mayo-Iscar (see Fritz et al. 2011, 2012). In the paper we present the pros and cons of this robust clustering algorithm and discuss the functions available in the tclust package. Then, using a dataset of air pollutant emissions in Krakow as an example, we try to evaluate the quality of the robust clustering algorithm.
robust cluster analysis, tclust algorithm, air quality testing
Fritz H., García-Escudero L. A., Mayo-Iscar A., (2011), A Fast Algorithm for Robust Constrained Clustering, URL http://www.eio.uva.es/infor/personas/tclust_algorithm.pdf.
Fritz H., García-Escudero L. A., Mayo-Iscar A., (2012), tclust: An R Package for a Trimming Approach to Cluster Analysis, Journal of Statistical Software, 47 (12), 1–26.
García-Escudero L. A., Gordaliza A., Matrán C., Mayo-Iscar A., (2011), Exploring the Number of Groups in Robust Model-Based Clustering, Statistics and Computing, 21 (4), 585–599.
García-Escudero L. A., Gordaliza A., Matrán C., Mayo-Iscar A., (2008), A General Trimming Approach to Robust Cluster Analysis, The Annals of Statistics, 36 (3), 1324–1345.
Genton M. G., Lucas A., (2003), Comprehensive Definitions of Breakdown Points for Independent and Dependent Observations, Journal of the Royal Statistical Society Series B, 65, 81–84.
Jajuga K., (1993), Statystyczna analiza wielowymiarowa, PWN, Warszawa.
Kosiorowski D., Mielczarek D., Szlachtowska E., (2015), Clustering of Functional Objects in Energy Load Prediction Issues, in: Papież M., Śmiech S., (eds.), Proceedings from 9th Professor Aleksander Zeliaś International Conference on Modelling and Forecasting of Socio-Economic Phenomena, Fundacja Uniwersytetu Ekonomicznego w Krakowie, 108–118.
Kosiorowski D., Zawadzki Z., (2014), DepthProc: An R Package for Robust Exploration of Multidimensional Economic Phenomena, http://arxiv.org/abs/1408.4542.
Kosiorowski D., (2008), Robust Classification and Clustering Based on the Projection Depth Function, in: Brito P., (ed.), COMPSTAT 2008, Proceedings in Computational Statistics, Physica-Verlag, Heidelberg, 209–216.
Krzyśko M., Wołyński W., Górecki T., Skorzybut M., (2008), Systemy uczące się, WNT.
Maronna R. A., Martin R. D., Yohai V. J., (2006), Robust Statistics – Theory and Methods, John Wiley & Sons, Chichester.
Rocke D. M., Woodruff D. L., (2002), Computational Connections Between Robust Multivariate Analysis and Clustering, in: Härdle R. B., (ed.), COMPSTAT 2002 Proceedings in Computational Statistics, 255–260.
Rousseeuw P. J., Van Driessen K., (1999), A Fast Algorithm for the Minimum Covariance Determinant Estimator, Technometrics, 41 (3), 212–223.
Rousseeuw P. J., (1987), Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis, Journal of Computational and Applied Mathematics, 20 (1), 53–65.
Walesiak M., Gatnar E., (eds.), (2009), Statystyczna analiza danych z wykorzystaniem programu R, PWN, Warszawa.